FOIL-D: Efficiently Scaling FOIL for Multi-relational Data Mining of Large Datasets
Authors
Abstract
Multi-relational rule mining is important for knowledge discovery in relational databases because it allows the discovery of patterns that span multiple relational tables. Inductive logic programming (ILP) techniques have had considerable success on a variety of multi-relational rule mining tasks; however, most ILP systems do not scale to very large datasets. In this paper we present two extensions to a popular ILP system, FOIL, that improve its scalability. (i) We show how to interface FOIL directly to a relational database management system, which enables FOIL to run on datasets that had previously been out of its scope. (ii) We describe histogram-based estimation methods that significantly decrease the computational cost of learning a set of rules. We present experimental results indicating that, on a set of standard ILP datasets, the rule sets learned using our extensions are equivalent to those learned with standard FOIL but are obtained at considerably lower cost.
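As a rough illustration of the second extension, the sketch below scores a candidate literal with the textbook FOIL gain and estimates the size of an equi-join from per-column histograms instead of executing the join. The abstract does not specify the estimator used in FOIL-D, so everything here is an assumption made for illustration: the function names (foil_gain, build_histogram, estimate_join_size), the equi-width buckets over integer keys, the uniform-within-bucket assumption, and the simplification that the number of positive bindings still covered equals p1.

```python
import math
from collections import Counter


def foil_gain(p0, n0, p1, n1):
    """Textbook FOIL gain for adding one literal to a clause.

    p0, n0: positive/negative bindings covered before adding the literal.
    p1, n1: positive/negative bindings covered after adding it.
    Simplification (assumed here): the count of positive bindings that
    remain covered is taken to be p1.
    """
    if p0 == 0 or p1 == 0:
        return 0.0
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return p1 * (info_before - info_after)


def build_histogram(values, width):
    """Equi-width histogram over integer values: bucket index -> row count."""
    return Counter(v // width for v in values)


def estimate_join_size(hist_a, hist_b, width):
    """Estimate the row count of an equi-join from two histograms.

    Assumes values are spread uniformly over the `width` integer values
    of each bucket, so matching buckets contribute count_a * count_b / width.
    """
    return sum(
        count * hist_b.get(bucket, 0) / width
        for bucket, count in hist_a.items()
    )


# Hypothetical join keys from two relations.
keys_a = [1, 1, 2, 3]
keys_b = [1, 2, 2, 5]

# With width 1 every bucket holds a single value, so the estimate is exact.
exact = estimate_join_size(build_histogram(keys_a, 1), build_histogram(keys_b, 1), 1)

# With wider buckets the estimate is approximate but needs far less state.
coarse = estimate_join_size(build_histogram(keys_a, 4), build_histogram(keys_b, 4), 4)

gain = foil_gain(10, 10, 8, 2)
```

The intent is that estimated counts of this kind could stand in for p1 and n1 when ranking candidate literals, so that only the most promising candidates need an exact count from the database.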
Similar resources
An approach to mining the multi-relational imbalanced database
The class imbalance problem is an important issue in classification of Data mining. For example, in the applications of fraudulent telephone calls, telecommunications management, and rare diagnoses, users would be more interested in the minority than the majority. Although there are many proposed algorithms to solve the imbalanced problem, they are unsuitable to be directly applied on a multire...
Towards Structural Logistic Regression: Combining Relational and Statistical Learning
Inductive logic programming (ILP) techniques are useful for analyzing data in multi-table relational databases. Learned rules can potentially discover relationships that are not obvious in "flattened" data. Statistical learners, on the other hand, are generally not constructed to search relational data; they expect to be presented with a single table containing a set of feature candidates. Howe...
Three Companions for First Order Data Mining
Three companion systems, Claudien, ICL and Tilde, are presented. They use a common representation for examples and hypotheses: each example is represented by a relational database. This contrasts with the classical inductive logic programming systems such as Progol and Foil. It is argued that this representation is closer to attribute value learning and hence more natural. Furthermore, the thre...
Three Companions for Data Mining in First Order Logic
Three companion systems, Claudien, ICL and Tilde, are presented. They use a common representation for examples and hypotheses: each example is represented by a relational database. This contrasts with the classical inductive logic programming systems such as Progol and Foil. It is argued that this representation is closer to attribute value learning and hence more natural. Furthermore, the thre...
Comparison of Three Parallel Implementations of an Induction Algorithm
Recently, researchers have tried to apply ILP to KDD because ILP enlarges the applicability of Machine Learning to cover KDD and Data Mining: it enables them to learn from multiple relational tables. Many scientific discovery systems are motivated from the desire to deal with larger databases. However the larger the databases are, the more computational power we need. Parallel computing is a pos...